The following report draws on GDP data for all countries and an independently calculated international absence of corruption index, covering the years 1975 to 2021. The index is built from a variety of factors, including government transparency, public officials’ use of state funds for personal gain, and general mistrust in the honesty of government officials. A higher absence of corruption index signals lower perceived corruption.
The following ER diagram represents how our data is structured.
# ER Diagram
knitr::include_graphics(here::here("Figures", "ERDiagram.png"))
| Variable | Type | Description |
|--|--|--|
| 🔑Country | String | Full name of a country |
| 🔑Year | Int | Year the data was collected |
| Country’s Absence of Corruption Index | Optional(Double) | An index measuring how free of corruption a country’s government is perceived to be. A lower value means less trust and more corruption. NA means no data is available for that year. Please check the section about NA values for some possible reasons. |
| GDP Per Capita | Double | Calculated GDP per capita (GDP/Population) |
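The two key columns in the table above imply that each (country, year) pair should appear at most once. A minimal sketch of how that could be checked is given here; the `toy` data frame and its values are made up for illustration and are not taken from our dataset.

```r
# Hypothetical toy data mirroring the table's columns (all values are made up).
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  Year = c(2020L, 2021L, 2020L, 2021L),
  Absence_Corruption_Index = c(55.2, NA, 71.0, 72.4),
  GDP_Per_Capita = c(4100, 4300, 38500, 39900)
)

# A composite key is valid when no (country, Year) pair appears twice.
key_is_unique <- !any(duplicated(toy[, c("country", "Year")]))
key_is_unique
```

The same check could be run on the joined data before modeling, so that duplicated rows do not silently give a country extra weight.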
The two quantitative variables our team has decided to focus on are GDP per Capita (GDP/Population) and Absence of Corruption as an indexed range. These variables are measured for all 172 countries. Below are the descriptions of each and what they represent, specifically in our data set:

GDP per Capita (GDP/Population): This variable represents a country’s GDP, or Gross Domestic Product, for a given year divided by the country’s population size. In other words, this variable is simply the per-person GDP for that country (Ingabire 2020).

Absence of Corruption Index: This variable represents the lack of corruption within a country’s public administration (Gapminder 2023). This indicator is determined by assessing public sector corrupt exchanges, public sector theft, executive embezzlement and theft, and executive bribery and corrupt exchanges. These values reflect the extent to which public officials do not use their power for personal benefit or gain. The index is scaled so that a higher value represents a greater absence of corruption, or simply put, less corruption. Conversely, a lower value represents a lesser absence of corruption, or simply put, more corruption.

The categorical variable Country refers to the country being observed when assessing the corresponding values for GDP per Capita and Absence of Corruption. The quantitative variable Year refers to the year that corresponds to the other variables’ observations; each value covers data collected from January to December of that year.
One thing we do want to acknowledge is the presence of NAs in our dataset, particularly in the corruption index. There could be a multitude of reasons why data isn’t present for a country in a particular year. Some examples are that the data was lost, the data was measured or calculated incorrectly, some disaster prevented the data from being collected, the country was too unstable for an assessment to be made, or the government didn’t want the public to know about its corruption (and therefore attempted to censor it). All these factors, some random and some not, will be taken into consideration when determining the relationship between the corruption index and GDP per capita.
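One way to get a feel for whether the missing values cluster in particular years is to compute the share of NAs per year. The sketch below uses a small made-up data frame (`toy`, with hypothetical countries and values); only the column names are borrowed from our data.

```r
# Hypothetical toy data (values are made up for illustration).
toy <- data.frame(
  country = c("A", "A", "B", "B", "C", "C"),
  Year = c(1975L, 2021L, 1975L, 2021L, 1975L, 2021L),
  Absence_Corruption_Index = c(NA, 48.0, 62.5, 63.1, NA, 30.2)
)

# Share of missing index values within each year:
# is.na() gives TRUE/FALSE, and the mean of a logical vector is a proportion.
na_share_by_year <- tapply(is.na(toy$Absence_Corruption_Index), toy$Year, mean)
na_share_by_year
```

If the shares differ strongly between years or regions, the missingness is probably not purely random, which is worth keeping in mind when interpreting any pooled analysis.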
Possible Hypothesis
Explanatory Variable : Absence of Corruption Index
Response Variable : GDP Per Capita
We believe that a higher absence of corruption index will lead to a higher GDP per capita. Traditionally, more stable and democratic forms of government have higher standards of living, fewer internal or external conflicts, equal opportunities for all individuals, and extensive welfare programs for their constituents. Furthermore, in general, they have stronger economies that are more resilient to wild economic swings. For instance, much of modern Europe has a high GDP per capita because of its modernization, democratic governments, and tight integration with the EU.
Data Visualizations
# Renames the year from xYYYY -> YYYY
full_data <- full_data |>
  mutate(Year = as.numeric(str_remove_all(Year, "^x")))
2021 Absence Of Corruption vs GDP per Capita
Below is a plot of countries and their respective Absence of Corruption Index vs. GDP per capita in the year 2021. Unless stated otherwise, the plots and models from this point on use 2021 as the year. We chose this year as our basis because it has no NA values and is balanced. Furthermore, it is a recent year and fairly representative of current times.
full_data |>
  filter(Year == 2021) |>
  ggplot(mapping = aes(x = Absence_Corruption_Index, y = GDP_Per_Capita)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Absence of Corruption vs GDP per Capita 2021",
    x = "Absence of Corruption",
    y = "GDP per Capita"
  ) +
  theme(
    axis.title.y = element_text(vjust = 0.5, angle = 0),
    plot.title = element_text(hjust = 0.5)
  )
Log Transformed 2021 Absence Of Corruption vs GDP per Capita
As many of our countries have quite low GDP per capita, they are all bunched up at the bottom of the plot. This makes visualization quite difficult. Therefore, we will log transform our y-axis variable to get a better view of this relationship.
# With Log Transformation
full_data |>
  filter(Year == 2021) |>
  mutate(GDP_Per_Capita = log(GDP_Per_Capita)) |>
  ggplot(mapping = aes(x = Absence_Corruption_Index, y = GDP_Per_Capita)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(
    title = "Absence of Corruption vs GDP per Capita(Log Transformed)\n 2021",
    x = "Absence of Corruption",
    y = "Log of GDP per Capita"
  ) +
  theme(
    axis.title.y = element_text(vjust = 0.5, angle = 0),
    plot.title = element_text(hjust = 0.5)
  )
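An alternative to transforming the column itself is to leave the data untouched and put the y-axis on a log scale, which keeps the tick labels in the original units. The sketch below is only an illustration on made-up values (`toy_2021` is hypothetical); note that `scale_y_log10()` uses base 10 rather than the natural log, which changes the axis units but not the shape of the point cloud.

```r
library(ggplot2)

# Hypothetical toy data (values are made up for illustration).
toy_2021 <- data.frame(
  Absence_Corruption_Index = c(25, 40, 55, 70, 85),
  GDP_Per_Capita = c(900, 2500, 8000, 30000, 65000)
)

log_axis_plot <- ggplot(
  toy_2021,
  aes(x = Absence_Corruption_Index, y = GDP_Per_Capita)
) +
  geom_point() +
  # Scale transformations are applied before statistics are computed,
  # so this "lm" line is fitted to log10(GDP per capita).
  geom_smooth(method = "lm", formula = y ~ x) +
  scale_y_log10() +
  labs(x = "Absence of Corruption", y = "GDP per Capita (log scale)")

log_axis_plot
```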
It looks like our original hypothesis has at least some grounds. Let’s see if such a trend has been present for the past five decades or so (the span for which we have absence of corruption index data) or if it only emerged recently.
History of Absence Of Corruption vs GDP per Capita
full_data |>
  # Drops NA values, which can't be plotted
  drop_na() |>
  ggplot(aes(x = Absence_Corruption_Index, y = GDP_Per_Capita, color = country)) +
  # Plots each country's data
  geom_point(alpha = 0.7, show.legend = FALSE) +
  # Drops the colour guide, which wouldn't be helpful for 172 countries
  guides(color = "none") +
  # Separates by region
  facet_wrap(~region) +
  # Adds one regression line per region for easier viewing; the colour is set
  # outside aes() so the line is actually drawn in blue
  geom_smooth(mapping = aes(group = 1), method = "lm", se = FALSE, color = "blue") +
  # Titles and labels
  labs(
    title = "Absence of Corruption vs GDP per Capita",
    subtitle = "Year: {frame_time}",
    x = "Absence of Corruption",
    y = "GDP Per Capita"
  ) +
  theme(
    axis.title.y = element_text(vjust = 0.5, angle = 0),
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5)
  ) +
  # Animating by year
  transition_time(as.integer(Year)) +
  ease_aes("linear")
The trend holds! However, only a true regression will tell us if there really is a correlation and how strong it may be.
Data Modeling & Prediction
Creating our Regression Models
While the log transformation gives us an easier way to see the upward trend between the Absence of Corruption Index and a nation’s GDP per capita, it actually does NOT help our linear model. In fact, the computed R-squared value is worse for the log-transformed model.
# 2021 Data
full_2021_data <- full_data |>
  filter(Year == 2021)
# 2021 Data with log transformation
log_2021_data <- full_2021_data |>
  mutate(GDP_Per_Capita = log(GDP_Per_Capita))

full_2021_data_lm <- lm(GDP_Per_Capita ~ Absence_Corruption_Index, full_2021_data)
log_2021_data_lm <- lm(GDP_Per_Capita ~ Absence_Corruption_Index, log_2021_data)
tibble(
  "R Squared Non Transformed" = summary(full_2021_data_lm)$r.squared,
  "R Squared Log Transformed" = summary(log_2021_data_lm)$r.squared
) |>
  reactable() |>
  add_title("Comparison of R Squared of Two Linear Models", font_size = 20, align = "center")
Comparison of R Squared of Two Linear Models
With that in mind, we will be sticking with our untransformed data for the remainder of this exploration.
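One caveat with this kind of comparison is that the two R-squared values are computed on different response scales (dollars versus log dollars). A possible way to put both models on the same footing is to back-transform the log model's fitted values and compute R-squared against the original response. The sketch below does this on simulated data; the variables `x` and `y`, the helper `r_squared()`, and the naive `exp()` back-transformation are all illustrative assumptions, not part of our analysis.

```r
set.seed(1)

# Made-up, right-skewed response that depends on x (illustration only).
x <- runif(100, 20, 90)
y <- exp(6 + 0.05 * x + rnorm(100, sd = 0.5))

raw_fit <- lm(y ~ x)
log_fit <- lm(log(y) ~ x)

# R-squared as 1 - (residual sum of squares / total sum of squares).
r_squared <- function(observed, predicted) {
  1 - sum((observed - predicted)^2) / sum((observed - mean(observed))^2)
}

r2_raw <- r_squared(y, fitted(raw_fit))            # original scale
r2_log_own_scale <- summary(log_fit)$r.squared     # log scale
r2_log_back <- r_squared(y, exp(fitted(log_fit)))  # log model, original scale

c(raw = r2_raw, log_own_scale = r2_log_own_scale, log_back = r2_log_back)
```

Only `r2_raw` and `r2_log_back` describe the same quantity, so those are the two values that can be compared directly.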
# Removes the transformed log models
remove(log_2021_data, log_2021_data_lm)
# Summary of Linear Model :
broom::tidy(full_2021_data_lm) |>
  rename(
    "Regression Variable" = term,
    "Model Estimate" = estimate,
    "Standard Error" = std.error
  ) |>
  select(c("Regression Variable":"Standard Error"))
| Regression Variable | Model Estimate | Standard Error |
|--|--|--|
| (Intercept) | -199747.304 | 31994.4920 |
| Absence_Corruption_Index | 8211.695 | 623.4122 |
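The estimate and standard error in the table above are enough to gauge how precisely the slope is estimated. As a rough sketch, the t statistic is the estimate divided by its standard error, and an approximate 95% interval can be formed with the normal quantile; using `qnorm()` instead of the exact t quantile is a simplifying assumption made here for illustration.

```r
# Slope estimate and standard error copied from the table above.
estimate <- 8211.695
std_error <- 623.4122

# t statistic: how many standard errors the slope is away from zero.
t_stat <- estimate / std_error

# Approximate 95% confidence interval (normal approximation).
ci_95 <- estimate + c(-1, 1) * qnorm(0.975) * std_error

round(t_stat, 2)
round(ci_95, 2)
```

A t statistic this far from zero suggests the positive slope is very unlikely to be a sampling artifact, although it says nothing about causation.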
Our linear model comparing countries’ 2021 Absence of Corruption Index with their 2021 GDP per capita yields the following coefficients:
\[
\begin{aligned}
\beta_0 &= -199747.304\\
\beta_1 &= 8211.695
\end{aligned}
\]
which gives us the linear regression formula
\[
\hat{y} = -199747.304+8211.695x
\]
or, rewritten with our explanatory and response variables,
\[
\widehat{GDP\_Per\_Capita} = -199747.304+8211.695(Absence\_Corruption\_Index)
\]
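To make the fitted line concrete, the formula can be wrapped in a small function and evaluated at a few index values. The function name `predict_gdp_per_capita()` and the example index values are hypothetical choices for illustration; only the two coefficients come from the model.

```r
# Fitted line from the regression above:
# predicted GDP per capita = -199747.304 + 8211.695 * index.
predict_gdp_per_capita <- function(absence_corruption_index) {
  -199747.304 + 8211.695 * absence_corruption_index
}

# Predictions for three illustrative index values.
predict_gdp_per_capita(c(30, 50, 70))

# Index value at which the fitted line crosses zero (about 24.3).
break_even_index <- 199747.304 / 8211.695
break_even_index
```

Below that crossing point the line predicts a negative GDP per capita, which is one sign that a straight line is only a rough description at the low end of the index.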
Analysis
With an \(R^2\) value of 0.50510, approximately 50% of the observed variability in GDP per capita is accounted for by our model. For every one-point increase in a country’s Absence of Corruption Index, the model predicts an increase in GDP per capita of about $8,212 on average.
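As a side note, in a simple linear regression with a single explanatory variable, \(R^2\) equals the square of the Pearson correlation coefficient \(r\), and the sign of \(r\) matches the sign of the slope. Under that standard identity, and using the values reported above, the implied correlation would be roughly:

\[
r = +\sqrt{R^2} = \sqrt{0.50510} \approx 0.711
\]

which would usually be read as a moderately strong positive linear association.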
Checking the Linear Model’s Accuracy
Although we have a moderate \(R^2\) between the Absence of Corruption Index and its response variable, GDP per capita, we still want to check that the model fits. One check is to compute the average residual of our model.
With an average residual of magnitude below \(10^{-9}\), this looks pretty promising, although a least-squares fit with an intercept always produces residuals that average to zero, so this is a sanity check rather than evidence of a good fit. To better visualize the distribution of residuals, let’s plot them on a histogram. What we expect is a roughly normal distribution of residuals, reminiscent of the familiar bell curve.
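The point about the average residual can be illustrated with a small simulation. In the sketch below, a straight line is deliberately fitted to curved, made-up data (`x`, `y`, and `fit` are hypothetical); the mean residual is still numerically zero, while a simple pattern check on the residuals reveals the poor fit.

```r
set.seed(42)

# Deliberately non-linear toy data (illustration only).
x <- runif(200, 0, 10)
y <- x^2 + rnorm(200)

# Straight-line fit to curved data.
fit <- lm(y ~ x)

# The mean residual is zero up to floating-point error for any
# least-squares fit that includes an intercept.
mean_residual <- mean(resid(fit))

# The misfit shows up as structure in the residuals instead:
# here they track the squared distance of x from its mean.
curvature_signal <- cor(resid(fit), (x - mean(x))^2)

c(mean_residual = mean_residual, curvature_signal = curvature_signal)
```

This is why the shape of the residual distribution, and residuals plotted against the explanatory variable, tend to be more informative than their average.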
---title: "Correlation of Country, Government, Index & GDP Per Capita"author: "Brian Kwong, Connor Gamba, Joey Bailitz, Arneh Begi"format: html: self-contained: true code-tools: true toc: true theme: Minty df-print: kableeditor: sourcecode-fold: trueexecute: error: true echo: true message: false warning: falsebibliography: references.bib---## Import the Needed LibrariesPlease make sure you have the following libraries installed:1. Tidyverse2. Here3. Janitor4. Styler5. Reactablefmtr6. gganimate7. gifski```{r}#| output: False# Set output scopeoptions(scipen =0)# Clears environmentremove(list =as.vector(ls()))#| output: Falselibrary(tidyverse)library(janitor)library(styler)library(reactablefmtr)library(gganimate)library(gifski)# Formats documents:style_file(path = here::here("Src", "Correlation_of_Country_Goverment_Index_&_GDP_Captia.qmd"), style = tidyverse_style, strict =TRUE)```# Introduction## BackgroundThe following information contains data collected on the GDP of all countries and an independent international calculation of an absence of corruption index from a variety of factors including government transparency, public official use of state dollars for personal gain and general mistrust in government officials being honest in their actions between the years 1975 and 2021. 
Higher absence corruption index signals ***lower*** perceived corruption.# Our Data## Importing the Data```{r}corruption_data <-read_csv(here::here("Datasets", "abscorrup_idea.csv"), show_col_types =FALSE) |> janitor::clean_names()gdp_per_cap_data <-read_csv(here::here("Datasets", "gdp_pcap.csv"), show_col_types =FALSE) |> janitor::clean_names()```## Matching Our DaasetSince our corruption only has data for years between \[1975,2021\] we filter our GDP per capita to those years too```{r}gdp_per_cap_data <- gdp_per_cap_data |>select(country, c(x1975:x2021))```## Piviot Data to Long Format```{r}#| warning: falsecorruption_data <- corruption_data |>pivot_longer(cols =c(x1975:x2021), names_to ="Year", values_to ="Absence_Corruption_Index") |>mutate(Absence_Corruption_Index =as.numeric(Absence_Corruption_Index))gdp_per_cap_data <- gdp_per_cap_data |>pivot_longer(cols =c(x1975:x2021), names_to ="Year", values_to ="GDP_Per_Capita") |>mutate(GDP_Per_Capita =if_else(str_detect(GDP_Per_Capita, "k$"), (as.numeric(str_remove_all(GDP_Per_Capita, "k$")) *10000), as.numeric(GDP_Per_Capita)))```## Data Joins```{r}full_data <- corruption_data |>inner_join(gdp_per_cap_data, by =c("country", "Year"))full_data <- full_data |>mutate(country =as_factor(country)) |>mutate(region =as.factor(case_when( country %in%c("Antigua and Barbuda", "Argentina", "Bahamas", "Barbados","Belize", "Bolivia", "Brazil", "Canada", "Chile", "Colombia","Costa Rica", "Cuba", "Dominica", "Dominican Republic","Ecuador", "El Salvador", "Grenada", "Guatemala", "Guyana","Haiti", "Honduras", "Jamaica", "Mexico", "Nicaragua", "Panama","Paraguay", "Peru", "St. Kitts and Nevis", "St. Lucia","St. 
Vincent and the Grenadines", "Suriname","Trinidad and Tobago", "USA", "Uruguay","Venezuela" ) ~"The Americas", country %in%c("Albania", "Andorra", "Armenia", "Austria", "Azerbaijan","Belarus", "Belgium", "Bosnia and Herzegovina", "Bulgaria","Croatia", "Cyprus", "Czech Republic", "Denmark", "Estonia","Finland", "France", "Georgia", "Germany", "Greece", "Holy See","Hungary", "Iceland", "Ireland", "Italy", "Latvia","Liechtenstein", "Lithuania", "Luxembourg", "North Macedonia","Malta", "Moldova", "Monaco", "Montenegro", "Netherlands","Norway", "Poland", "Portugal", "Romania", "Russia", "San Marino","Serbia", "Slovak Republic", "Slovenia", "Spain", "Sweden","Switzerland", "Turkey", "Ukraine","UK" ) ~"Europe", country %in%c("Algeria", "Angola", "Benin", "Botswana", "Burkina Faso","Burundi", "Cameroon", "Cape Verde", "Central African Republic","Chad", "Comoros", "Congo, Dem. Rep.", "Congo, Rep.","Cote d'Ivoire", "Djibouti", "Egypt", "Equatorial Guinea","Eritrea", "Ethiopia", "Gabon", "Gambia", "Ghana", "Guinea","Guinea-Bissau", "Kenya", "Lesotho", "Liberia", "Libya","Madagascar", "Malawi", "Mali", "Mauritania", "Mauritius","Morocco", "Mozambique", "Namibia", "Niger", "Nigeria", "Rwanda","Sao Tome and Principe", "Senegal", "Seychelles","Sierra Leone", "Somalia", "South Africa", "Sudan", "Eswatini","Tanzania", "Togo", "Tunisia", "Uganda", "Zambia", "Zimbabwe","South Sudan" ) ~"Africa", country %in%c("Afghanistan", "Australia", "Bahrain", "Bangladesh", "Bhutan","Brunei", "Cambodia", "China", "Fiji", "Hong Kong, China","India", "Indonesia", "Iran", "Iraq", "Israel", "Japan", "Jordan","Kazakhstan", "Kiribati", "North Korea", "South Korea", "Kuwait","Kyrgyz Republic", "Lao", "Lebanon", "Malaysia", "Maldives","Marshall Islands", "Micronesia, Fed. 
Sts.", "Mongolia","Myanmar", "Nauru", "Nepal", "New Zealand", "Oman", "Pakistan","Palau", "Papua New Guinea", "Philippines", "Qatar", "Samoa","Saudi Arabia", "Singapore", "Solomon Islands", "Sri Lanka","Syria", "Taiwan", "Tajikistan", "Thailand", "Timor-Leste","Tonga", "Turkmenistan", "Tuvalu", "UAE","Uzbekistan", "Vanuatu", "Palestine", "Vietnam","Yemen" ) ~"Asia-Pacific" )))```## Understanding Our DataThe following ER diagram represents how our data is structured```{r include-flow-map, fig.align='center'}# ER Diagramknitr::include_graphics(here::here("Figures", "ERDiagram.png"))```| Variable | Type | Description ||--|--|--|| 🔑Country | String | Full Name of a Country || 🔑Year | Int | Year This data was collected| Country's Absence of Corruption Index | Optional(Double) | A index measuring how corrupt a perceived country's government is. Lower value means less trust and more corruption. NA means no data is unavailable for that year. Please check the section about NA values for some possible reasons || GDP Per Captia | Double | Calculated GDP per Captia (GDP/Population)The two quantitative variables our team have decided to focus on are GDP per Capita (GDP/Population) and Absence of Corruption as an indexed range. These variables are measured for all 172 countries. Below are the descriptions of each and what they represent, specifically in our data set: GDP per Capita (GDP/Population): This variable represents a countries GDP or Gross Domestic Product for a given year divided by the countries population size. In other words, this variable is simply the per-person GDP for that [@ingabire_2020_gapminder] country.Absence of Corruption Index: This variable represents the lack of corruption within a countries public administration.[@gapminder_2023__ideademocracy] This indicator is determined by assessing public sector corrupt exchanges, public sector theft, executive embezzlement and theft, executive bribery and corrupt exchanges. 
These values are determined by assessing the extent that public officials within an office administration do not use their power for personal benefit or gain. The scaling of this index is such that a lower value represent a higher absence of corruption, or simply put less corruption. Conversely, a high value for this variable represents a lower absence of corruption, or simply put, more corruption. The categorical variable Country refers to the country that is being observed or referred to when assessing the corresponding values for GDP per Capita and Absence of Corruption. The quantitative variable Year, refers to the given year that corresponds to the other variables observations. Year refers to data collected from January to December for the given year.## Summary```{r}#| output: Falsefull_data |>select(country) |>summarise(Count =n_distinct(country))```There are a total of ***172*** countries in this studyBelow is a quick summary of the average mean absence corruption index and GDP per Capita out of every country for every year```{r}full_data |>group_by(country) |>mutate(across(.cols =c(Absence_Corruption_Index, GDP_Per_Capita), .fns =~mean(.x, na.rm =TRUE))) |>group_by(Year) |>summary() |>unclass() |>as_tibble() |>select(Absence_Corruption_Index, GDP_Per_Capita) |>drop_na() |>rename("Absence Corruption Index"= Absence_Corruption_Index, " GDP Per Capita"= GDP_Per_Capita)```## About those Pesky NAsOne thing we do want to acknowledge is the presence of NAs in our datasheet, particularly in the corruption index. There could be a multitude of reasons why this that data isn't present for a country in a particular year. Some examples are that the data was lost, data was measured or calculated incorrectly, some disaster prevented the data from being collected, there is an insufficient stability for determination to occur or that the government doesn't want the public to know about their corruption (therefore attempting to censor it). 
All these factors, some randomized and some not, will be taken into our consideration when determining the relationship between corruption index and GDP per capita.## Possible Hypothesis- Explanatory Variable : Absence of Corruption of Index- Response Variable : GDP Per CapitaWe believe that a higher absence of corruption index will lead to a higher GDP per capita. Traditionally, our more stable and democratic forms of governments have higher standards of living, less internal and or external conflicts, equal opportunities for all individuals, and large amounts of well fares for their constituency. Furthermore, in general, they have stronger economies that are more reliant to wild economic swings. For instance much of modern Europe has a high GDP per capita because of its modernization, democratic governments and tight integration with the EU.# Data Visualizations```{r}# Renames the year from xYYYY -> YYYYfull_data <- full_data |>mutate(Year =as.numeric(str_remove_all(Year, "^x")))```## 2021 Absence Of Corruption vs GDP per CapitaBelow is a plot of countries and their respective Absence of Corruption Index vs. GDP Per Capita in the year 2021. At this point, we note that all of the plots from this point use 2021 as the year. This particular year is our basis, since it is has no NA values and is balanced. Furthermore, it is a recent year, fairly representative of the current times.```{r}full_data |>filter(Year ==2021) |>ggplot(mapping =aes(x = Absence_Corruption_Index, y = GDP_Per_Capita)) +geom_point() +geom_smooth(method ="lm") +labs(title ="Absence of Corruption vs GDP per Capita 2021", x ="Absence of Corruption", y ="GDP per Capita") +theme(axis.title.y =element_text(vjust =0.5, angle =0), plot.title =element_text(hjust =0.5))```## Log Transformed 2021 Absence Of Corruption vs GDP per CapitaAs many of our countries have quite poor GDP's per capita, they are all bunched up on the bottom. This makes visualization quite difficult. 
Therefore, we will log transform our Y-axis to get a better view of this relationship```{r}# With Log Transformationfull_data |>filter(Year ==2021) |>mutate(GDP_Per_Capita =log(GDP_Per_Capita)) |>ggplot(mapping =aes(x = Absence_Corruption_Index, y = GDP_Per_Capita)) +geom_point() +geom_smooth(method ="lm") +labs(title ="Absence of Corruption vs GDP per Capita(Log Transformed)\n 2021", x ="Absence of Corruption", y ="Log of GDP per Capita") +theme(axis.title.y =element_text(vjust =0.5, angle =0), plot.title =element_text(hjust =0.5))```It looks like our original hypothesis seems to have at least some grounds. Lets see if such a trend has been present for the past 50 years (the extent in which we have data for absence of corruption index) or if this is just a trend that matriculated recently.## History of Absence Of Corruption vs GDP per Capita```{r}full_data |># Drops NA values cant be plotted...drop_na() |>ggplot(aes(x = Absence_Corruption_Index, y = GDP_Per_Capita, color = country)) +# Plots each country's datageom_point(alpha =0.7, show.legend =FALSE) +# Drops the colour guide wouldn't be helpful for 172 countries...guides(color ="none") +# Seperates by regionfacet_wrap(~region) +# Makes a regression line for easier viewinggeom_smooth(mapping =aes(x = Absence_Corruption_Index, y = GDP_Per_Capita, color ="blue"), method ="lm", se =FALSE) +# Tittle stufflabs(title ="Absence of Corruption vs GDP per Capita", subtitle ="Year: {frame_time}", x ="Absence of Corruption", y ="GDP Per Capita") +theme(axis.title.y =element_text(vjust =0.5, angle =0), plot.title =element_text(hjust =0.5), plot.subtitle =element_text(hjust =0.5)) +# Animating by yeartransition_time(as.integer(Year)) +ease_aes("linear")```The trend holds! 
However, only a true regression will tell us if there really is a correlation and how strong it may be.# Data Modeling & Prediction## Creating our Regression ModelsWhile the log transformation gives us an easier way to see the upwards trend between Absence of Corruption Index and the nation's GDP per capita, it actually does ***NOT*** help our linear model. In fact, comparing the computed R-squared values is ***worse*** on the log transformed model.```{r}# 2021 Datafull_2021_data <- full_data |>filter(Year ==2021)# 2021 Data with log transformationlog_2021_data <- full_2021_data |>mutate(GDP_Per_Capita =log(GDP_Per_Capita))``````{r}full_2021_data_lm <-lm(GDP_Per_Capita ~ Absence_Corruption_Index, full_2021_data)log_2021_data_lm <-lm(GDP_Per_Capita ~ Absence_Corruption_Index, log_2021_data)tibble("R Squared Non Transformed"=summary(full_2021_data_lm)$r.squared, "R Squared Log Transformed"=summary(log_2021_data_lm)$r.squared) |>reactable() |>add_title("Comparison of R Squared of Two Linear Models", font_size =20, align ="center")```With that in mind, we will be sticking with our untransformed data for the remainder of this exploration.```{r}# Removes the transformed log modelsremove(log_2021_data, log_2021_data_lm)# Summary of Linear Model :broom::tidy(full_2021_data_lm) |>rename("Regression Variable"= term, "Model Estimate"= estimate, "Standard Error"= std.error) |>select(c("Regression Variable":"Standard Error"))```Our linear model comparing different countries in 2021 and their respective Absence of Corruption Index vs their 2021 GDP Per Capita yields us the following results.$$\beta_0 = -199747.304\\\beta_1 = 8211.695$$ which will give us the linear regression formula of : $$\hat{y} = -199747.304+8211.695x$$or rewritten with our explanatory and response variables $$\hat{GDP\_Per\_Captia} = -199747.304+8211.695(Absence\_Corruption\_Index)$$## AnalysisWith an $R^2$ value of 0.50510, approximately 50% of our observed GDP per capita variability is accounted for in 
our correlation. For every score increase of 1 for a country's Absence of Corruption Index, their GDP per capita will increase $8211.66.## Checking the Linear Model's AccuracyAlthough we have a mild $R^2$ relation between Absence of Corruption Index and its response variable GDP per capita, we still want to check that the model fita. One way to check the average residual of our model is below.```{r}full_2021_data_lm_resd <- full_2021_data_lm |> broom::augment() |>select(.resid)full_2021_data_lm_resd |>summarize("Avergae Residual"=mean(.resid)) |>reactable() |>add_title("Model Residual Stats", font_size =20, align ="center")```With a average residual $<|10^{-9}|$, this looks pretty promising. However, to confirm our calculations and better visualize the distributions of residuals, lets plot them on a histogram. What we expect is a normal distribution of all residuals reminiscent of the familiar bell curve.```{r}options(scipen =100)full_2021_data_lm_resd |>ggplot(mapping =aes(x = .resid)) +geom_histogram(aes(y =after_stat(density)), bins =50) +labs(x ="Residuals", y ="Percentage of Distrubution") +geom_density(colour ="steelblue") +theme(axis.title.y =element_text(angle =0, vjust =0.5))options(scipen =0)```\`## Simulations and Model```{r}observed_2021_data <- full_2021_data |>select(GDP_Per_Capita) |>mutate(GDP_Per_Capita =as.numeric(GDP_Per_Capita)) |>pull()predicted_2021_data <- full_2021_data_lm |>predict()``````{r}generateRealisticPredictions <-function(predictedValues, standardError) { mutatedPredictedValues <- predictedValues +rnorm(length(predictedValues), 0, standardError)return(mutatedPredictedValues)}findModelFit <-function(linearModel) { newPredicted <- linearModel |> broom::augment() newPredicted <- newPredicted |>select(GDP_Per_Capita, .fitted) |>mutate(.fitted =generateRealisticPredictions(.fitted, sigma(linearModel))) observedPredictedLm <-lm(GDP_Per_Capita ~ .fitted, newPredicted)return(broom::glance(observedPredictedLm) |>select(r.squared) 
|>pull())}``````{r}set.seed(05081995)sim_model_res <-map_dbl(.x =c(1:100), .f =~findModelFit(full_2021_data_lm))summary(sim_model_res) |>unclass() |>as.list() |>as.data.frame() |>pivot_longer(cols =c("Min.":"Max."), names_to ="Stat", values_to ="Values") |>mutate(Stat =str_remove(Stat, "[X]")) |>mutate(Stat =str_replace(Stat, "\\.", " ")) |>reactable() |>add_title("Model Residual Simulations Stats", font_size =20, align ="center")```Plotting the $R^2$ on a histogram we get ```{r,fig.width=9}options(scipen = 100)sim_model_res |> as.data.frame() |> ggplot(mapping = aes(x = sim_model_res)) + geom_histogram(aes(y = after_stat(density)), bins = 50) + ggtitle(bquote("Distrubution of " ~ R^2 ~ "between Normalized Simulated Data and Observed Data")) + labs(x = bquote(R^2), y = "Percentage of\n Distrubution") + geom_density(colour = "steelblue") + theme(axis.title.y = element_text(angle = 0, vjust = 0.5), plot.title = element_text(hjust = 0.5))```